(54) Method and apparatus for flow control in a packet-switched computer system 



(57) This invention describes a link-by-link flow con- 
trol method for packet-switched uniprocessor and mul- 
tiprocessor computer systems that maximizes system 
resource utilization and throughput, and minimizes sys- 
tem latency. The computer system comprises one or 
more master interfaces, one or more slave interfaces, 
and an interconnect system controller which provides 
dedicated transaction request queues for each master 
interface and controls the forwarding of transactions to 
each slave interface. The master interface keeps track
of the number of requests in the dedicated queue in the
system controller, and the system controller keeps track
of the number of requests in each slave interface queue.
Both the master interface and the system controller know
a priori the maximum capacity of the queue immediately
downstream from them, and do not issue more transaction
requests than the downstream queue can accommodate. An
acknowledgment from the downstream queue indicates to
the sender that there is space in it for another transaction.
Thus no system resources are wasted trying to send a
request to a queue that is already full.
Description 

Background of the Invention 

The present invention relates to a new method and 
apparatus for flow control in a packet-switched micro- 
processor-based computer system (uniprocessor), a 
network of such computer systems, or a multiprocessor 
system. 

Reliable and streamlined flow control of microprocessor
transactions (including data and/or instructions)
in a computer system is necessary to prevent wasted 
operations and assist the system to run efficiently. Typ- 
ical packet-switched bus interfaces in computer sys- 
tems use negative acknowledgments, or back pressure 
techniques, for flow control. This requires either the 
master issuing retried transactions due to negative ac- 
knowledgments by slaves, or complex handshake sig- 
nals to modulate the back pressure as a function of dy- 
namic changes in system resource availability. 

Both of these methods are complex to implement 
and increase the bus latency by spending extra clock 
cycles performing handshakes, re-arbitrating for the 
bus, re-scheduling interconnect resources, etc. 

Some systems avoid dynamic (negative) feedback 
(e.g. negative acknowledgments) altogether by making 
all the receive queues of a large enough size, so that 
they can contain the maximum number of transactions 
that can foreseeably be received by them. This has the 
disadvantage that it makes the slaves more expensive 
and inherently non-scalable, especially when there are 
several sources of requests with each making multiple 
requests. This approach can also aggravate the system 
latency, on account of the very deep receive queues. 

As the rate of processor speed improvements continues
to exceed that of memory speed improvements,
reducing interconnect latencies becomes ever more im- 
portant to achieving substantial gains in performance. It 
would be especially useful to arrive at a system that 
avoids the use of negative feedback, to avoid the inef- 
ficiencies associated with that approach, without paying 
in complexity, size and hardware costs by inflating the 
sizes of the transaction receive queues. 

Various flow control techniques for packet-switched 
systems have been created, such as for telecommuni- 
cation switch designs and data communication proto- 
cols. Most extant designs use some form of dynamic 
feedback signal to throttle the sender, on either (1) a 
link-by-link basis where each downstream node sends 
a feedback signal to the upstream node, or (2) an end- 
to-end basis, where only the two end points of the com- 
munication participate in flow control, and the presence 
or absence of feedback from the destination is used by 
the sender to control its flow. Another method that has 
been proposed is rate-based flow control, where the 
sender meters its rate of send at an agreed-upon rate, 
and modulates it based on dynamic feedback signals 
sent to it by any point downstream. Most packet-switched
computer systems use a link-by-link flow control
protocol.

In link-by-link flow control, the downstream recipient 
asserts a feedback signal to the upstream sender,
indicating a buffer-full or congestion condition at the
recipient. The feedback signal can take one of several forms:

(a) a pulse signal causing the sender to back off for 
some time and retry again; 

(b) a level signal causing the sender to not send, 
while the signal remains asserted; or 

(c) a stop-start two-message feedback from the 
destination to throttle and restart the sender respec- 
tively. (Most existing packet-switched system buses 
use this method.) 

Link-by-link flow control may also operate through 
credits issued by the downstream recipient to the up- 
stream sender, where the sender only sends as many 
requests as the credits allow, and then stops until new 
credits are issued to it by the downstream recipient. The 
downstream recipient dynamically modulates the 
number and frequency of credits issued to the upstream 
sender, based on the recipient's receive queue capacity 
as well as any other congestion conditions detected by 
the recipient. For a description of such a system, see
"The FCVC Proposal for ATM Networks," Proc. 1993
International Conf. on Network Protocols, pp. 116-127,
by H. T. Kung et al.
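For contrast with the invention, the credit-based scheme just described can be sketched in a few lines of Python. This is a minimal, hypothetical model (the class and method names are invented for illustration): the recipient here issues credits only as entries drain from its queue, whereas in the FCVC proposal the recipient may also modulate credits according to congestion, which the sketch does not attempt to model.

```python
from collections import deque


class CreditReceiver:
    """Downstream recipient that issues credits as its receive queue drains."""

    def __init__(self, queue_capacity: int) -> None:
        self.queue_capacity = queue_capacity
        self.queue: deque = deque()

    def free_slots(self) -> int:
        return self.queue_capacity - len(self.queue)

    def accept(self, packet) -> None:
        # A well-behaved sender never triggers this, since it only sends on credit.
        if len(self.queue) >= self.queue_capacity:
            raise OverflowError("sender exceeded its credits")
        self.queue.append(packet)

    def drain(self, n: int) -> int:
        """Consume up to n queued packets; return the number of new credits to issue."""
        consumed = 0
        while self.queue and consumed < n:
            self.queue.popleft()
            consumed += 1
        return consumed


class CreditSender:
    """Upstream sender that may only send while it holds credits."""

    def __init__(self) -> None:
        self.credits = 0

    def grant(self, n: int) -> None:
        self.credits += n

    def try_send(self, packet, receiver: CreditReceiver) -> bool:
        if self.credits == 0:
            return False  # stop until the recipient issues new credits
        self.credits -= 1
        receiver.accept(packet)
        return True
```

In this model the sender never learns the recipient's queue capacity directly; it only sees the credits it has been granted, which is the limitation the invention addresses by fixing queue sizes at initialization.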

Link-by-link flow control can also be accomplished 
by making the downstream queue large enough to
accommodate the maximum number of items it is
contemplated ever to have to receive. Hence there is no loss of
items (requests), and no need to throttle the upstream 
sender. This is an expensive, nonscalable solution. 

End-to-end flow control via absence or presence of 
feedback from the destination to the source is imple- 
mented in the TCP sliding window protocol for flow con- 
trol and congestion management, as well as in BECN/ 
FECN proposal for congestion control in ATM networks. 

One approach to flow control is to use a master de- 
vice to keep track of the number of transactions that it 
can issue with a predefined agreement of queue depths 
at its interface to the interconnect, and the interconnect 
issuing an ACK (acknowledgment) for each completed 
transaction as a means of flow control. In protocols that 
have used this approach (HDLC and SDLC), the sender 
has no information about the capabilities (notably, 
queue size) of the final recipient of the packet. In addi- 
tion, such protocols require that the queue depths be 
determined in advance, and hence are not scalable. Al- 
so, this approach does not lend itself well to a multiproc- 
essing environment, since each processor (or master) 
would have to keep track of all the transactions of all the 
other processors (or masters). 

In a credits-based scheme, credits are issued by a 
downstream recipient to allow new packets to be sent. 
Such a scheme has the disadvantages that: (1) credits 
are issued as frequent and unsolicited feedback packets 
from one switch to another; and (2) credits are modulat- 
ed based on dynamic congestion conditions. 

Thus, a new apparatus and method are needed, 
which accommodate uniprocessor, multiprocessor and 
networked environments and provide for efficient, scal- 
able and cost-effective packet-switched transaction flow 
control. 

Summary of the Invention 

The present invention employs a method and ap- 
paratus for determining the total queue sizes of all the 
queues in the system at initialization time, and for per- 
mitting a master (e.g. a processor) to send a number of 
transaction requests only to the extent of that total. The 
system of the invention classifies system interconnect 
queues as request queues, read-data queues, and 
write-data queues, and determines rules of transfer of 
both requests and data as loosely coupled events. An 
interconnect (system) controller is connected between 
one or more masters (e.g. microprocessors) and the 
slave devices, which may be I/O units, disk drives,
memory, etc. The interconnect controller includes a queue
for each master, and each master includes a transaction 
counter indicating the number of outstanding transac- 
tion requests from that master to the controller. The in- 
terconnect controller additionally includes both request 
and write data queues for each downstream slave, and 
a transaction counter indicating the number of outstand- 
ing requests from the controller to that slave and the out- 
standing write data transfers from some master to that 
slave. 

The masters and the controller are prevented from
issuing any transaction requests (or initiating any
write-data transfer requests) downstream when the
respective counter indicates that the corresponding request or
data queue downstream is full. When a transaction is 
complete (e.g. upon the receipt of requested data read 
or consumption of write data by the slave), the relevant 
counter is decremented to indicate the availability of a 
place in the transaction queue or write-data queue. 
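The counter rule in the two paragraphs above can be sketched as a small Python class. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, and one instance stands for one counter (for example a master's request counter, or the controller's request counter or write-data counter for one slave).

```python
class DownstreamQueueCounter:
    """Sender-side count of occupied entries in the queue immediately downstream.

    The capacity is learned once, at initialization, so no dynamic feedback
    signal is needed to avoid overflowing the downstream queue.
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.outstanding = 0

    def is_full(self) -> bool:
        return self.outstanding >= self.capacity

    def issue(self) -> bool:
        """Account for one request (or write-data transfer) if a slot is free."""
        if self.is_full():
            return False  # hold the request upstream; nothing is sent
        self.outstanding += 1
        return True

    def complete(self) -> None:
        """Called when the transaction completes and its downstream slot is freed."""
        if self.outstanding == 0:
            raise RuntimeError("completion with no outstanding transaction")
        self.outstanding -= 1
```

Because the sender refuses to issue when the counter is at capacity, a request is never sent only to be rejected, so no retry or negative acknowledgment path is needed.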

Queue overflow and congestion conditions are thus 
avoided by prohibiting the master or system controller 
from sending more transactions or data than the recip- 
ient has space for. A hardware handshake is used both 
to signal completion of a data transfer and to notify the 
master of one more available space in the downstream 
queue. The handshake is thus not an unsolicited signal, 
as in a credits-based scheme, and the signals are not 
based upon dynamic congestion. 

The maximum queue sizes in the system are deter- 
mined at initialization, and thus are known before exe- 
cution of any applications by the master(s). The masters 
and controller thus have at all times information on the 
number of available spots in the queue immediately 
downstream - to be contrasted with a credits-based 
scheme, where the maximum queue sizes are not
known a priori, and the sender can only keep track of
the credits issued to it. The initialization sequence is
software-driven (e.g. by a boot PROM), and the queue
sizes and depths are determined by this sequence,
which allows the system to be reconfigured for
different slaves (having different queue sizes) or to
configure out nonfunctioning queue elements.

The elimination of (advance or overflow) feedback 
signals in the present flow control system reduces the 
interface latency, since there is no extra handshake, no
rescheduling or rearbitrating for resources, and no
retries by the master. Hence a simpler design is usable,
which is easily scalable according to the number of
processors, and the slave queues can be downsized as
desired for price/performance considerations and desired
bandwidth, without fear of losing any transactions due 
to smaller queue sizes. Furthermore, a variety of sys- 
tems ranging from small/inexpensive to large/expensive 
can be designed from the same modular CPU and I/O 
interfaces by simply down- or up-scaling (sizing) the re- 
spective queues and buffers in the interconnect, as de- 
sired. Since the interconnect controller is custom-de- 
signed to accommodate a given set of masters and 
slaves with a given range of queue sizes, the masters 
and slaves needn't be redesigned at all. Because the 
system controller is relatively inexpensive, a number of 
different controller designs can be utilized without ap- 
preciably raising the cost of the system - which would 
not be the case if the processors and slave devices 
needed modification. 

The overall system design, and the effort required to
test and validate correct system behavior under
saturation conditions (when flow control is important),
are also greatly simplified.

Brief Description of the Drawings 

Figure 1 is a block diagram of a preferred embodi- 
ment of a computer system incorporating the present 
invention. 

Figure 1 A is a block diagram of a more generalized 
embodiment of a computer system incorporating the
invention. 

Figure 2 is a more detailed diagram of a portion of 
the system shown in Figure 1.

Figures 3A-3B together constitute a flow chart illus- 
trating a generalized implementation of the method of 
the invention. 

Figures 4-7 are block diagrams illustrating transac- 
tion flow control according to the invention for different 
types of transactions. 

Description of the Preferred Embodiments 

Figure 1 is a top-level block diagram of a computer 
system 10 incorporating the present invention. This di- 
agram relates to a specific implementation of applicant's 
new Ultrasparc Architecture, which is described fully in 
the document UPA Interconnect Architecture, by Bill van
Loo, Satya Nishtala and Zahir Ebrahim. Sun Microsystems,
Inc.'s internal release version 1.1 of the UPA
Interconnect Architecture has been submitted as Appendix A
to a related patent application by applicant, entitled
"Method and Apparatus for Flow Control in a Packet-Switched
Computer System", by Ebrahim et al. That patent
application, filed in the United States Patent Office
on March 31, 1995, describes many of the broader
features of the UPA architecture, and is incorporated herein
by reference.

The present invention uses a new system intercon- 
nect architecture and concomitant new methods for uti- 
lizing the interconnect to control transaction requests 
and data flow between master devices and slave or 
memory devices. 

In Figure 1, the system 10 includes a UPA module
20 and an interconnect network or module 25, which in 
different embodiments of the invention may or may not 
be connected to the data path. The UPA module may 
include such devices as a processor 30, a graphics unit 
40, and an I/O unit 50. Other units may be included, and 
act as the master units for the purposes of the present 
invention. A master interface is defined as the interface 
for any entity initiating transaction requests; examples 
of such masters are a CPU making memory requests, 
or an I/O channel and bridges making DMA requests. 

In general, in this application a master is exempli- 
fied by a processor. However, a master may be any 
transaction-requesting device, whether or not it includes 
a microprocessor. Similarly, a "slave" refers herein to 
any device that can accept a transaction request, includ- 
ing both memory and non-memory devices, etc., and in- 
cluding devices such as processors and I/O controllers 
that may themselves act as masters. 

For the purposes of this invention, a "transaction" 
may be defined as a request packet issued by a master, 
followed by an acknowledgment packet (not necessarily 
a full packet, depending upon the chosen implementa- 
tion) from the recipient immediately downstream. There 
may or may not be a data transfer accompanying a re- 
quest packet, and the data transfer may either occur on 
the same set of wires as the request packet, or on sep- 
arate datapath wires. This is described in greater detail 
below in connection with Figures 4-7. 

A UPA port 60 couples the module 20 to a system 
interconnect controller (SC) 70, which is in turn coupled 
to one or more slave interface(s) 80. The slave interface 
may be an interface for memory (such as main memory), 
an I/O interface, a graphics frame buffer, one or more 
bridges to other interconnection networks, or even a 
CPU receiving transactions to be serviced. In general, 
any device that accepts transaction requests for servic- 
ing may be given a slave interface 80 in accordance with 
the invention, such as conventional memory device(s) 
85 and/or standard I/O device(s) 95. 

In a preferred embodiment, the system controller 70
and UPA interface 60 are carried on the main processor
chip, and the slave interface is on the motherboard,
though many variations are possible. More generally,
each master (whether a processor or some other
device) has a UPA master interface, and each slave
includes a UPA slave interface. The system controller in
each case resides with the system.

A datapath crossbar 90 is also included in the
interconnect module 25, and is coupled to the slave
interface(s), the system controller 70, and the ports 60. The
datapath crossbar may be a simple bus or may be a more
complicated crossbar. (The UPA ports 60 may be
configured as part of either the UPA module 20 or the
interconnect module 25.) The datapath unit 90 is used to
transmit read and write data in a manner to be described
below.

One or more conventional memories or other data
storage devices 85 and one or more input/output (I/O)
devices 95 forming part of the system 10 are provided
for user interface, data output, etc.; these various slave
devices may include RAM, ROM, disk drives, monitors,
keyboards, track balls, printers, etc. They are coupled
into the interconnect module 25 via the slave interfaces
80. The "slave" designation means in this case only that
such devices accept requests from one or more
processors and fulfill those requests.

The interconnection network may in general take
the form of a number of different standard communication
topologies that interconnect masters and slaves,
such as a point-to-point link, a single bus or multiple
buses, or switching fabrics. The interconnect may employ
any of a number of conventional mechanisms for switching
the transaction request to a slave using one or more
signal paths, and the switching may be based either on
the addressing information contained in the transaction
request packet, or on another protocol not necessarily
dependent on the contents of the request packet. There
may be any amount of buffering, or no buffering, in the
interconnect.

The preferred embodiment of the invention shown in
Figure 1 (and in Figure 1A; see discussion below)
has a centralized controller that connects to all masters
and all slaves, and consequently has complete visibility
to system request and data traffic. An alternative
embodiment involves the use of distributed controllers, in
which case it is desirable to maintain visibility, and in
certain designs a maximum-capacity queue size may be
needed.

Figure 1A shows a more generalized block diagram
of a system according to the invention. Here, there are
multiple masters (three exemplary masters M1-M3 being
shown). These masters may act in certain circumstances
as slaves. For instance, if M1 is a processor
and M3 is an I/O controller, then M3 will often act as a
slave to M1, as in the initialization procedure described
below. On the other hand, during a DMA (direct memory
access) operation, the I/O controller M3 will act as a
master to a memory device, such as any of one to many
of memories represented as Mem1 ... Mem2 in Figure 1A.
Slave devices S1 ... S2 (which may be one, several 
or many slave devices) are also provided, and the mas- 
ters, memories and slaves are coupled via a system 
controller 75 in the same fashion as the system controller
70 is coupled to the master and slave(s) in Figure 1.
The SC 75 is coupled via a datapath control bus 77 to 
a datapath crossbar (or interconnect) 92, as with the da- 
tapath crossbar 90 in Figure 1. The control bus 77 will 
typically be much narrower than the system or data bus- 
es; e.g. in a preferred embodiment of applicant's sys- 
tem, the datapath is 72 or 144 bits wide, while the SC 
datapath control bus may be only 8 bits wide. 

As indicated above, the SC 75 has complete visibility
to all masters, slaves, and memory. The system controller
need not be on the datapath, but should have control over
and visibility to the datapath.

The SC, masters, memories and slaves in Figure 
1 A are interconnected by address/control (A/ctrl) lines 
as shown, which may be unique (dedicated, point-to- 
point links) address/control lines or may be bussed to- 
gether. Data may also be bussed or switch-connected. 
Address/control and data lines/buses may share the 
same links, such as by providing shared address/data 
buses. 

A boot PROM 94 is connected by a bus to the I/O 
controller M3, which reads it upon start-up to initialize 
the system in a conventional manner (e.g. to initialize 
the CPU, registers, etc.), and in addition to initialize the 
queues, registers and counters of the present invention. 
The initialization procedure is described in detail below 
relative to Figure 4. 

Figure 2 illustrates an interconnect module 100 in a
specific implementation where two master interfaces (or
"masters") 110 and 120, a single system controller (SC)
130, two slave interfaces (or "slaves") 140 and 150,
and a datapath crossbar 155 are used. There may in
principle be any number of masters and slaves. The
masters may be any of the interfaces discussed above, or
in general any devices or entities capable of issuing
transaction requests.

Each slave 140 and 150 includes a slave queue 
(160 and 170, respectively) for receiving transaction
requests. The maximum sizes of these slave queues are
represented by values in port ID registers 180 and 190, 
respectively. 

Masters 110 and 120 include data queues or buffers
115 and 125, and slaves 140 and 150 include data
queues or buffers 185 and 195, whose functions are
described in detail relative to Figures 6 and 7 below. The
maximum sizes of the slave write data queues 185 and
195 are also represented by values in port ID registers
180 and 190, respectively. In the special case where
there is a one-to-one correspondence between a request
queue entry (e.g. in queue 160) and a data buffer in the
write data queue (e.g. 185), with the write data queue
being maximally dimensioned to hold an entire packet
(i.e. dimensioned such that it can hold the largest
contemplated packet size), then the queue size in
register 180 can be represented by a single number.
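The contents of a port ID register can be pictured as a pair of sizes. The sketch below is hypothetical: the field names are invented, since the text states only that registers 180 and 190 hold the maximum sizes of the slave request queue and write-data queue, not how they are laid out. It also shows the special case in which a single number describes both queues.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PortIdSizes:
    """Queue sizes advertised by a slave's port ID register (e.g. 180 or 190).

    Field names are hypothetical; the text does not specify a register layout.
    """

    request_queue_entries: int             # maximum depth of the slave request queue
    write_data_buffers: int                # maximum buffers in the slave write-data queue
    buffers_hold_max_packet: bool = False  # each buffer can hold the largest packet

    @classmethod
    def one_to_one(cls, entries: int) -> "PortIdSizes":
        """Each request entry has its own full-packet write-data buffer."""
        return cls(entries, entries, buffers_hold_max_packet=True)

    def single_number(self) -> Optional[int]:
        """Return one size covering both queues, or None if two values are needed."""
        if self.buffers_hold_max_packet and (
            self.request_queue_entries == self.write_data_buffers
        ):
            return self.request_queue_entries
        return None
```

Equal counts alone are not treated as the special case here; the sketch also requires that each buffer hold a maximum-size packet, matching the condition stated in the text.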

In Figure 2 the slaves 140 and 150 may be any of 
the slave devices described above with respect to slave 
80 in Figure 1, and in particular slaves 140 and 150 may
represent any number of memory or non-memory slave
devices.

Initialization Operation 

The basic steps that take place at initialization in- 
clude: 

(1) determine the sizes of the respective receive 
queues of all the slaves coupled to the system; 

(2) store the sizes of the slave receive queues in 
registers within the system controller; 

(3) determine the sizes of the system controller's 
receive queues; and 

(4) store the sizes of the system controller receive 
queues in predetermined registers in the master(s). 
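The four initialization steps can be modeled as a small Python function. In this sketch (all names are hypothetical) the reads and writes that the boot code actually performs through the I/O controller and the datapath are collapsed into direct attribute access, so only the flow of size values, from the slaves to the system controller and from the system controller to the masters, is shown.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Slave:
    name: str
    request_queue_size: int      # from the slave's ID register
    write_data_queue_size: int   # from the slave's ID register


@dataclass
class SystemController:
    sciq_sizes: Dict[str, int]   # per-master SCIQ depth, held in the SCID registers
    config: Dict[str, Tuple[int, int]] = field(default_factory=dict)


@dataclass
class Master:
    name: str
    config: int = 0              # depth of this master's SCIQ, once initialized


def initialize(slaves: List[Slave], sc: SystemController,
               masters: List[Master]) -> None:
    # Steps (1) and (2): read each slave's receive-queue sizes and store
    # them in the system controller's config register (one field per slave).
    for s in slaves:
        sc.config[s.name] = (s.request_queue_size, s.write_data_queue_size)
    # Steps (3) and (4): read the controller's per-master receive-queue sizes
    # and store each in the corresponding master's config register.
    for m in masters:
        m.config = sc.sciq_sizes[m.name]
```

For example, with SCIQ depths of 4 and 2 for two masters, each master ends up holding only the depth of its own dedicated SCIQ; no master needs to know about the other's traffic.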

Thus, at system initialization, the initialization
software reads the contents of the size fields for the request
queue and write data queue in each slave, and then copies
these values into corresponding fields inside the
configuration (config) register 200 of the SC 130. In one
embodiment, the values in ID registers 180 and 190
(representing the slave queue sizes) are stored in
separate fields in configuration ("config") register 200 of the
SC 130. In addition, the values of the SCID registers
255 and 265 (representing the SC queue sizes) are
stored in config registers 270 and 280, respectively, of
the master interfaces 110 and 120.

Alternatively, config register 200 may be replaced 
by a separate configuration register for each UPA port 
implemented in the given SC. In this case, there would 
be two separate config registers, one for each of slaves 
140 and 150. 

The masters 110 and 120 also include transaction 
request output queues 290 and 300, respectively, which 
are used to queue up transaction requests from each of 
the master interfaces to the SC 130. Each master 110 
and 120 has a counter (310 and 320) used to track the 
number of pending transaction requests, as described 
below. 

The SC 130 is provided with output queues 210 and
220 and associated counters 230 and 240, respectively, 
whose operation will be described below. 

The SC 130 also includes an SC instruction (or 
transaction request) queue (SCIQ) for each master, so 
in this case there are two SCIQ's 250 and 260. Associ- 
ated with each SCIQ is an SCID register, namely regis- 
ters 255 and 265, respectively, containing a value rep- 
resenting the maximum size of the associated SCIQ. 

The circuitry for carrying out the operations of the 
SC is indicated by SC logic module 245 in Figure 2, and 
may include conventional hardwired and/or software 
logic for carrying out the necessary functions. For
instance, an ASIC may be provided for carrying out the
transaction request handling, queue control, numerical
comparisons and counting, etc. that are used in the
invention. Alternatively, a general purpose processor
could be used, configured (such as by program instructions
stored in an associated conventional memory, e.g.
ROM or RAM) to execute the functions discussed
herein.

Many combinations of standard hardware and software
are possible to execute these functions in the system
controller; and the same is true of the functions carried
out in the slave devices (see slave logic modules
142 and 152) and the master devices (see master logic
modules 112 and 122). Here, the logic modules represent
all of the circuitry, programming, memory and intelligence
necessary to carry out the functions of the invention as
described; assembling the hardware and software to do so
is a matter of routine to one skilled in the art, given the
teaching of this invention. (Where a master device is a
processor, the logic for implementing the present
invention can of course be made up in large part of the
processor itself and the instructions it executes.) The
particular implementation of these logic modules, and the
extent to which it is represented by software or hardware,
are widely variable and thus shown only in block form in
Figure 2.

The initialization sequence will now be described
with reference to Figures 1A and 2 (for the architecture)
and Figures 6-7 (for the flow control of the initialization
sequence). The instructions for the initialization
sequence are stored in nonvolatile memory, in this
embodiment in the boot PROM 94. The processor M1 has a
fixed address to the boot PROM 94, and accesses it by
a read request over address/control line A/ctrl-1 to the
SC 75. The SC sends the request via the datapath
control line or bus 96 (which may be an 8-bit bus) to the
datapath crossbar 92, which in turn accesses the I/O
controller M3. The I/O controller thus acts as a slave to
the processor M1 in this operation.

(It should be noted throughout the present
description that for the sake of clarity split address and
data buses are assumed and illustrated; however, the
present invention is equally applicable to systems using
shared address/data buses.)

The I/O controller M3 accesses the boot PROM 94
to read the code for the initialization sequence, and
sends it via line A/ctrl-3 to the SC 75, which sends it on
to the processor M1.

In Figure 1A, the SC 75, masters M1-M3, slaves 
S1-S2 and memories Mem1-Mem2 include the config 
registers, counters, SCID registers, ID registers, master 
queues, SCIQ's, and slave queues as depicted in Figure 
2; however, for the sake of clarity these elements are 
not shown in Figure 1 A. 

Once the processor M1 has retrieved the initialization
sequence instructions from the boot PROM 94, they
are executed. The first operation is to read the ID
registers of the memories and slaves. These ID registers,
as described above with respect to Figure 2, contain the
sizes of the respective slaves' instruction queues and
write data queues. The flow control sequence that is
followed for this read operation is that described below
for the Slave Read Flow Control in Figure 6, the data
from the ID registers being retrieved via a data bus (or
datapath) 715.

The ID register values are written to the config reg- 
isters (such as config register 200) of the system con- 
troller (75 in Figure 1 A, 130 in Figure 2). As discussed 
above, there is one config register per slave, or at least 
one field in a config register for each slave. The flow
sequence followed for this write operation is as
discussed below relative to Figure 7. The I/O controller
for the system is used for this purpose. Thus, assuming in
Figure 7 that for this operation the slave 710 is the I/O
controller, the master (in this case, a processor) 700
causes the SC 720 to write the ID register values from
each slave to its own config registers. In each case, the
respective ID register value is stored in a buffer of the
processor (master 700 in Figure 7 or master M1 in
Figure 1A), and this value is passed via the system controller
to the I/O controller (slave 710 in Figure 7 or master/slave
M3 in Figure 1A), which then writes it right back
to the system controller via the datapath provided for
that purpose (data bus 715 in Figure 7).

The next step in the initialization procedure is to 
read the sizes of the receive queues of the system con- 
troller (e.g. the SCIQ's 0 and 1 shown in Figure 7 or 
SCIQ's 250 and 260 in Figure 2). The receive queue 
sizes are stored in the SCID registers (see registers 255 
and 265 shown in Figure 2). This read operation is ex- 
ecuted using the I/O controller of the system, resulting 
in the master/processor storing the SC receive queue 
values in a buffer or preassigned register. 

Finally, these SCIQ sizes are written into the master 
config registers (such as 270 and 280 in Figure 2). If the 
system is a uniprocessor system, then this amounts to the
processor writing the SCID values to its own config
register and to the config registers of other devices that can
act as masters. If it is a multiprocessor system, then one 
processor acts as a master and writes SCID values to 
both its own config register and to those of the other 
processors. 

General Operation of Flow Control 

Below is a generalized description of transaction re- 
quest flow control in the present invention, followed by 
a more specific description of the preferred embodiment 
of the invention including details as to the initialization 
sequence and flow control for specific types of transac- 
tion requests. 

EP 0 735 476 A1

After initialization of the config register 200 in the
SC 130 and the config registers 270 and 280 in the
masters, normal operation of the system 100 can
commence. During operation, the SC maintains in its config
register 200 a copy of the respective values of the slave
ID registers 180 and 190, and hence "knows" the
maximum number of transaction requests that each slave
interface can handle in its slave request queue (160 or
170), and the maximum amount of data that can be held
in its slave data queue (185 or 195). At any given time,
the counters 230 and 240 store the number of pending
transaction requests in the corresponding slave request
queue, and the size of the pending store data in the
slave store data queue. Unissued transaction requests
may in some circumstances be stored for the slaves 140
and 150 in output queues 210 and 220, which may be
arbitrarily large, and in particular may be larger than the
SCIQ's 250 and 260. In other circumstances, requests
remain enqueued in corresponding SCIQ's.
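The SC's bookkeeping for one slave can be sketched as two independent counts, one for request entries and one for pending write data, reflecting the loosely coupled treatment of requests and data. This is a minimal model under stated assumptions: write data is counted in bytes here (the text says only "size"), and the class and method names are hypothetical.

```python
class SlaveQueueTracker:
    """SC-side bookkeeping for one slave (e.g. counter 230 for slave 140).

    Capacities are copied from the slave's ID register at initialization.
    Write data is tracked by size, here assumed to be in bytes.
    """

    def __init__(self, max_requests: int, max_write_bytes: int) -> None:
        self.max_requests = max_requests
        self.max_write_bytes = max_write_bytes
        self.pending_requests = 0
        self.pending_write_bytes = 0

    def can_issue(self, write_bytes: int = 0) -> bool:
        """A request needs a free request slot and, for a write, room for its data."""
        return (self.pending_requests < self.max_requests
                and self.pending_write_bytes + write_bytes <= self.max_write_bytes)

    def issue(self, write_bytes: int = 0) -> None:
        if not self.can_issue(write_bytes):
            raise RuntimeError("would overflow the slave; keep the request queued")
        self.pending_requests += 1
        self.pending_write_bytes += write_bytes

    def request_retired(self) -> None:
        """The slave has taken the request out of its request queue."""
        self.pending_requests -= 1

    def write_data_consumed(self, write_bytes: int) -> None:
        """The slave has consumed write data, freeing space in its data queue."""
        self.pending_write_bytes -= write_bytes
```

With separate counts, a full write-data queue blocks only further writes; a read, which carries no write data, can still be issued while request slots remain.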

When a master, e.g. master interface 110, has a 
transaction request to issue, it first compares the value 
in its counter 310 with the value in the config register 
270. If the counter value is less than the config register 
value, then the request may be issued. The request is 
sent from the master's output queue 290 to the SCIQ 
250, and the counter 310 is incremented by one. 
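The master-side check described above is a credit test: the counter may never reach beyond the known depth of the downstream queue. The sketch below is a hedged illustration; `MasterCredit`, `try_issue`, and the use of Python deques for the output queue 290 and the SCIQ 250 are assumptions made for the example.

```python
from collections import deque


class MasterCredit:
    """Flow-control state of one master interface (hypothetical names):
    `limit` mirrors config register 270, `count` mirrors counter 310."""

    def __init__(self, limit: int) -> None:
        self.limit = limit  # SCIQ depth, as read from the SCID register
        self.count = 0      # requests sent to the SCIQ and not yet replied to

    def try_issue(self, output_queue: deque, sciq: deque) -> bool:
        """Move the oldest request from the master's output queue (290) into
        the SCIQ (250) only if the counter is below the config value."""
        if not output_queue or self.count >= self.limit:
            return False
        sciq.append(output_queue.popleft())
        self.count += 1
        return True


master = MasterCredit(limit=2)
out_q, sciq = deque(["req_a", "req_b", "req_c"]), deque()
results = [master.try_issue(out_q, sciq) for _ in range(3)]
print(results, list(sciq), list(out_q))
# [True, True, False] ['req_a', 'req_b'] ['req_c']
```

The third request stays in the master because the SCIQ is known to be full; nothing is sent that would have to be refused.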

The SC 130 then determines to which of the two
slaves 140 and 150 the transaction request is destined,
and checks the counter for that slave. For instance, if
slave 140 is the destination for the transaction request,
then the SC 130 checks the counter 230 and compares
the value stored there with the value in the config
register 200 corresponding to the ID register 180. If the
counter 230 value is less than the value stored in the
config register, then the SC 130 issues the transaction
request and increments the counter 230. Otherwise, the
transaction request is maintained in the output queue
210. (In some cases involving ordering constraints
among different requests from the same master, it may
be desirable to leave the request in the SCIQ 250.)
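The same credit test, applied by the SC on the slave side, can be sketched as follows. This is an assumption-laden model rather than the patent's logic: `SlaveChannel` bundles, for one slave, the depth copied from its ID register, the counter (230 or 240), the SC output queue (210 or 220), and a stand-in for the slave request queue (160 or 170). Requests held in the output queue are issued in order as soon as a slot frees.

```python
from collections import deque


class SlaveChannel:
    """Hypothetical SC bookkeeping for one slave interface."""

    def __init__(self, limit: int) -> None:
        self.limit = limit            # slave request-queue depth (from config register 200)
        self.count = 0                # counter 230/240: requests pending in the slave
        self.output_queue = deque()   # SC output queue 210/220
        self.slave_queue = deque()    # stands in for slave request queue 160/170

    def submit(self, request) -> None:
        # Every request goes behind any already-held ones, preserving order.
        self.output_queue.append(request)
        self.pump()

    def pump(self) -> None:
        # Issue held requests while the counter is below the config value.
        while self.output_queue and self.count < self.limit:
            self.slave_queue.append(self.output_queue.popleft())
            self.count += 1

    def complete_one(self) -> None:
        # The slave finished its oldest request: free the slot, retry held ones.
        self.slave_queue.popleft()
        self.count -= 1
        self.pump()


ch = SlaveChannel(limit=1)
ch.submit("r1")
ch.submit("r2")
print(list(ch.slave_queue), list(ch.output_queue))  # ['r1'] ['r2']
ch.complete_one()
print(list(ch.slave_queue), list(ch.output_queue))  # ['r2'] []
```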

Assuming the transaction request is issued in this
example, the SC 130 sends a signal to the master
110 to this effect (upon completion of the transaction,
e.g. the transfer of data) and removes the transaction
request from its input queue 250 (upon sending of the
reply). The master 110 accordingly decrements its counter
310, which enables it to issue an additional pending
transaction request. If the counter 310 was at its
maximum (indicating that the SCIQ 250 was full),
decrementing the counter 310 makes room for a single
additional transaction request from the master 110 to the
SC 130. If the counter 310 was not at its maximum value,
then decrementing the counter 310 simply adds one to
the number of transaction requests available to the
master interface 110.
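The effect of the completion signal on the master's counter can be shown in a few lines. In this hedged sketch, `MasterCounter`, `available`, `on_issue`, and `on_s_reply` are hypothetical names; the point is only that each reply returns exactly one credit, whether or not the SCIQ was previously full.

```python
class MasterCounter:
    """Hypothetical model of counter 310 paired with config register 270."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.count = 0

    def available(self) -> int:
        # How many more requests the master may send before the SCIQ is full.
        return self.limit - self.count

    def on_issue(self) -> None:
        if self.count >= self.limit:
            raise RuntimeError("issuing now would overflow the SCIQ")
        self.count += 1

    def on_s_reply(self) -> None:
        # The SC has removed one of this master's requests from its SCIQ.
        if self.count == 0:
            raise RuntimeError("S_REPLY received with no request outstanding")
        self.count -= 1


m = MasterCounter(limit=3)
for _ in range(3):
    m.on_issue()
print(m.available())  # 0: the SCIQ is full
m.on_s_reply()
print(m.available())  # 1: room for exactly one more request
```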

The output queues 210 and 220, which may be
arbitrarily large (and in particular may be much larger,
if desired, than the SCIQs 250 and 260 and the slave input
queues 160 and 170), are preferable but not necessary
to the operation of the present invention. If separate
output queues are not kept for the two slaves (queue 210
for slave 140 and queue 220 for slave 150), or if ordering
constraints for the master prevent the use of both
queues 210 and 220, then the transaction requests
stored in queues 250 and 260 must wait until the
respective destination slaves can accept them before
being cleared out of their queues.

Such ordering constraints in the system may be
global ordering requirements. That is, in a particular system
it may be required that a pending transaction in queue
210 from master 110 (intended for slave 140) be
processed before a succeeding transaction from master 110
intended for slave 150 can be processed.

Aside from such an ordering requirement, or
assuming the pending transactions in SCIQs 250 and 260
are from different masters, either of these queues
250 and 260 can release a request for either slave 140
or 150 via the SC output queues 210 and 220, thereby
allowing an increase in throughput. For instance, a slave
140 request in SCIQ 260 can be sent to SC output queue
210 even if slave 140 is full (i.e. its input queue 160 is
full), and a succeeding slave 150 request from SCIQ
260 can then be sent to slave 150. If the SC output
queues were not provided, then the slave 150 request
would have to wait for slave 140 to clear before being
issued. The SC output queues thus provide truly
independent operation of the two slave interfaces.
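The throughput argument above is the classic head-of-line blocking case. The sketch below drains one SCIQ in order, with and without per-slave output queues, assuming (as the paragraph does) that no ordering constraint ties the two requests together. The function `dispatch`, the `(request_id, destination)` tuples, and the free-slot dictionary are hypothetical illustrations.

```python
from collections import deque


def dispatch(sciq: list, slave_free: dict, use_output_queues: bool):
    """Drain one SCIQ in order. `slave_free` maps slave id -> free slots in
    its input queue. Returns (issued, output_queues, left_in_sciq)."""
    pending = deque(sciq)
    free = dict(slave_free)
    issued = []
    output_queues = {slave: [] for slave in slave_free}
    while pending:
        req_id, dest = pending[0]
        if free[dest] > 0 and not output_queues[dest]:
            free[dest] -= 1
            issued.append(req_id)
        elif use_output_queues:
            output_queues[dest].append(req_id)  # park it; the SCIQ keeps moving
        else:
            break  # head-of-line blocking: everything behind it must wait
        pending.popleft()
    return issued, output_queues, list(pending)


reqs = [("a", 140), ("b", 150)]  # slave 140 is full, slave 150 has one free slot
print(dispatch(reqs, {140: 0, 150: 1}, use_output_queues=False))
# ([], {140: [], 150: []}, [('a', 140), ('b', 150)])
print(dispatch(reqs, {140: 0, 150: 1}, use_output_queues=True))
# (['b'], {140: ['a'], 150: []}, [])
```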

The SCIQs 250 and 260 are independent of one
another, as are the master interfaces and their
respective counters. Thus, the SC 130 is configured to handle
a predetermined number of requests from each of the
masters, with the number of requests that can be
accepted from each master being independent of the
other(s); that is, the sizes of the SCIQs are independent of
one another. In addition, an individual master may be
capable of issuing multiple requests that are independent
of its other requests, so the request queue 290
(or 300) and the corresponding SCIQ 250 (or 260) can in
each case be split into multiple queues.

Any master can request transactions to any slave
via the SC, for any selected number of masters and
slaves. The SC will typically be an ASIC configured for
a given system with a predetermined maximum number
of master and slave interfaces. Since it is a relatively
simple and inexpensive ASIC (by comparison with the
system as a whole), it provides great flexibility and
economy by allowing designers to easily configure different
SCs at low cost for different systems, each one tailored
to the specific needs of that system.

The logic for carrying out the invention is
provided by hardware/firmware logic of the SC ASIC and
the master and slave interfaces, and by program
instructions stored in memory 85 of the system, as shown in
Figure 1. Alternative embodiments may implement the
logic in other fashions, e.g. by providing memories and
general purpose processors to carry out any of the steps
executed by the master interfaces, system controller
and slave interfaces of the preferred embodiment of this
invention.
Operation of the System Controller

Referring now to Figures 3A-3B, at initialization
(box/method step 400) all UPA port ID registers (e.g. the
slave ID registers 180 and 190 in Figure 2) are read,
and their contents are written into the appropriate fields
in the SC config register 200 (or into separate, dedicated
config registers, as discussed above). The embodiment
with separate fields in a single SC config register is more
likely to be used when the UPA (slave) ports are
configured with a PROM instead of a port ID register. In the
present application, whenever fields of the config
register are referred to, the term may alternatively be taken
to mean separate config registers, and vice versa.

At box 410, the master registers are now initialized, 
which involves reading the SCID registers 255 and 265 
and writing the SCIQ sizes (stored in those registers) in 
the respective config registers 270 and 280. 

Since at start-up the config register 200 fields and
the config registers 270-280 must allow at least one
transaction apiece (to read their corresponding ID
registers 180-190 and 255-265, respectively), they are
initialized to a value of "1" to begin with, to "bootstrap" the
start-up. Then, when the read-ID-register transaction
requests are issued and checked against the counters,
the requests will be allowed. (If the config registers were
all at 0, no transactions would be allowed.) Upon the
reading of the respective ID registers, the config register
values are replaced with the correct values, i.e. the
actual queue sizes stored in their associated ID registers.
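The bootstrap can be illustrated for a single link (master to SCIQ, or SC to slave queue). In this hedged sketch, `BootstrappedLink` and `read_id_register` are hypothetical names, and the ID-register read is collapsed into one call that issues the request, receives the reply, and installs the real size.

```python
class BootstrappedLink:
    """Hypothetical sender-side state for one link."""

    def __init__(self) -> None:
        self.config = 1  # bootstrap value: exactly one request is allowed
        self.count = 0

    def can_send(self) -> bool:
        return self.count < self.config

    def read_id_register(self, id_register_value: int) -> None:
        # The ID-register read is the single request the bootstrap permits.
        if not self.can_send():
            raise RuntimeError("no credit available for the ID-register read")
        self.count += 1                  # request issued
        self.count -= 1                  # reply received, slot released
        self.config = id_register_value  # replace the bootstrap value


link = BootstrappedLink()
link.read_id_register(8)
print(link.config, link.can_send())  # 8 True

stuck = BootstrappedLink()
stuck.config = 0                     # what an all-zero start-up would mean
print(stuck.can_send())              # False: not even the ID read could issue
```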

At box 420, it is determined whether a new transac- 
tion request is pending in one of the masters, e.g. the 
master 120. If not, the procedure stops at box 422 (but 
may recommence at box 420 at any time that a new 
transaction request is made). 

At box 424, if the pending transaction request is for 
a read operation, then the system determines whether 
the master read data buffer (discussed in greater detail 
below) for the master interface is ready to accept data, 
i.e. whether there is sufficient room in the master read 
data buffer to receive the data to be read. If not, then 
the system waits as at box 426 until the master read 
data buffer is ready. Note that a write operation need not 
be held up during a wait period for a read operation, but 
may proceed independently; and vice versa. 

For a write operation, the system determines 
whether the data to be written to one of the slaves via a 
slave interface or memory is in fact ready for writing in 
(transmission to) a master write buffer. If not, again at 
box 426 a wait period is executed, until the data item is 
ready for writing. 

When either the read or the write operation is ready
for execution as far as the master interface is
concerned, then at box 430 the system tests whether the
value of the master counter (in this example, counter
320) is equal to the value stored in the config register,
i.e. the size of the associated SCIQ 260 (as originally
provided by the SCID register 265). (The master counter
should never be able to exceed the value stored in the
config register, but in case it did, this could be taken
into account by using ">=" instead of "=" in the
comparison test.) If the master has not issued a number of
requests equal to the total SCIQ 260 size, then this test
will be false and the method proceeds to box 440.

If the counter value has reached its maximum
allowable value, then the transaction request will not be
passed on to the SC, and the method proceeds to box
500. In this case, the transaction request pending in the
master interface is required to wait (box 510) until a
complete-transaction signal has been received from the SC
before it can be issued. In a preferred embodiment, this
complete-transaction signal takes the form of an
S_REPLY signal, discussed in detail below with respect
to Figures 4-7.

When this complete-transaction signal is received
by the master interface 120 (box 500), the master
interface decrements the counter associated with that SCIQ
(box 530) and proceeds to the step at box 440.

At box 440, the counter 320 is incremented by one,
and at box 450 the transaction request is sent by the
master to the SC. Thus, the counter 320 now reflects
the sending of one (or one additional) transaction
request.
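The master-side loop of boxes 420-450 and 500-530 can be exercised with a small randomized simulation that checks the key property: the counter never exceeds the SCIQ size. This is a sketch under assumptions; `run_master`, the 0.6 issue probability, and the random timing of S_REPLY signals are illustrative choices, not part of the described method.

```python
import random


def run_master(num_requests: int, sciq_size: int, seed: int = 0) -> int:
    """Simulate the master-side steps for one master and return the highest
    counter value reached. Request and S_REPLY timing are random (assumption)."""
    if sciq_size < 1:
        raise ValueError("a config value of 0 would never allow a request")
    rng = random.Random(seed)
    pending = num_requests  # requests waiting in the master (box 420)
    counter = 0             # master counter, e.g. counter 320
    peak = 0
    while pending or counter:
        wants_to_issue = pending > 0 and rng.random() < 0.6
        if wants_to_issue and counter < sciq_size:  # box 430 test is false
            counter += 1                            # box 440
            pending -= 1                            # box 450: send to the SC
            peak = max(peak, counter)
        elif counter > 0:
            # An S_REPLY arrives (boxes 500-530): free one SCIQ slot.
            counter -= 1
    return peak


peak = run_master(num_requests=100, sciq_size=4)
print(peak <= 4)  # True: the SCIQ is never overfilled
```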

It will be appreciated that boxes 420-450 and
500-520 all relate to method steps that are carried out
by or in the master or master interface, such as master
interfaces 110 and 120. It will be seen below that boxes
452-458 and 462-490 (i.e. almost all of Figure 3B) relate
to method steps carried out in or by the system controller
(SC). Boxes 460 and 495 relate to the method steps of
reading and writing data as appropriate.

The SC is provided with intelligence and/or logic
(hardware and/or software) to determine whether it has
a transaction request pending in its request receive
queue (such as SCIQs 250 and 260). If so, then at box
452 the transaction request at the head of the queue is
examined to determine which slave is intended as the
recipient for the request. This queue control and
recipient determination is carried out in a conventional
manner.

At box 454 (Figure 3A), the method determines
whether the pending operation is a memory operation
or a non-memory slave operation. If it is a memory
operation, then at box 456 the method determines whether
the recipient memory is valid, given global or other
ordering constraints.

Some such possible constraints relate to delaying
the processing of a memory or slave request by a given
master until any requests to any other memory or slave,
respectively, by that same master are resolved. That is,
from a given master, e.g. master 1, a series of
transaction requests to slave 1 may issue, and then a
transaction request may be issued for slave 2. A preferred
embodiment of the present system ensures that all of the
pending slave 1 requests (from master 1) are completed
before the new slave 2 request is executed. This
ensures that any slave 1 action upon which the new slave 2
transaction might rely will have taken place. Thus,
strong global ordering of transaction requests from a
given master with respect to requests issued to different
slaves is maintained. This is accomplished by requiring
the master to await a signal called S_REPLY from slave 1
before issuing the slave 2 request, as discussed below.
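The cross-slave ordering rule can be sketched as a per-master gate. This minimal Python model, assuming hypothetical names (`OrderedMaster`, `may_issue`, `issue`, `s_reply`), allows several requests to the same slave to be outstanding, but blocks a request to a different slave until every earlier request to other slaves has received its S_REPLY. It models only this one constraint, not the queue credits.

```python
from collections import Counter


class OrderedMaster:
    """Hypothetical strong-global-ordering gate for one master."""

    def __init__(self) -> None:
        self.outstanding = Counter()  # slave id -> requests awaiting S_REPLY

    def may_issue(self, slave: str) -> bool:
        # Allowed only if no request to any *other* slave is still pending.
        return all(n == 0 for s, n in self.outstanding.items() if s != slave)

    def issue(self, slave: str) -> None:
        if not self.may_issue(slave):
            raise RuntimeError(f"wait for S_REPLY from other slaves before {slave}")
        self.outstanding[slave] += 1

    def s_reply(self, slave: str) -> None:
        self.outstanding[slave] -= 1


m = OrderedMaster()
m.issue("slave1")
m.issue("slave1")             # back-to-back requests to one slave are fine
print(m.may_issue("slave2"))  # False: two slave1 requests still pending
m.s_reply("slave1")
m.s_reply("slave1")
print(m.may_issue("slave2"))  # True
```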

In other systems, it may be preferable to allow
master 1 to freely issue multiple requests to slave 1 without
awaiting an S_REPLY (transaction-complete) signal
from slave 1. Even in such systems, there may be
ordering or other constraints upon transactions that can
temporarily disallow given memories or non-memory
slaves from accepting certain transactions, either of
predetermined transaction types or from particular masters,
or both, or for some other reasons.

If for any of these reasons the recipient memory is 
not valid or available at this time, then at box 458 the 
method waits until the memory is valid and available. 

If the recipient memory is valid, then at box 460 the
data is read or written to/from memory as required, and
the S_REPLY (transaction complete) signal is sent, as
described in greater detail below.

If the pending transaction is a non-memory slave 
transaction, then at box 462 the method determines 
which slave is to receive the request. At box 464, it is 
determined whether the recipient slave is a valid recip- 
ient at this time, given the ordering or other constraints 
mentioned above. If not, at box 466 the method waits 
until the slave is valid. 

Once the slave is valid for this transaction, then the 
transaction request is moved into the corresponding SC 
output queue (SCOQ) 210 or 220. 

If the pending transaction is a slave write
transaction, then at this time (box 470) the SC enables the
datapath 155 via a datapath control signal, and the master
(whose transaction is pending) is then able to move the
data through the datapath to the appropriate slave input
queue (185 or 195). The SC then sends its
transaction-complete (S_REPLY) signal to both the master and the
slave (see discussion below relative to Figure 7).

At box 475, the SC 130 then checks the counter for
the recipient slave, e.g. counter 240 if slave 150 is the
destination for the pending transaction request. If the
counter equals or exceeds the value in the config
register (i.e. the size indicated by the ID register 180 or 190,
which was read at initialization), then the request is not
yet allowed. In this case, steps 530-550 are followed
(essentially identical to steps 500-520), until a
free line opens up in the destination slave queue to
receive the transaction request.

If the appropriate counter (230 or 240) has not 
reached its maximum allowed value, then it is incre- 
mented by one (box 480), and the transaction request 
is sent to the recipient slave (box 490). 

If the pending transaction is a slave read request,
then at this point (box 495) the read operation is initiated.
When it is complete, the slave sends a P_REPLY to the
SC, and the SC sends S_REPLYs to both the requesting
master and the recipient slave. See the discussion
below relating to Figure 6 for details about the
transaction and data flow for slave read requests.

At this point, the method then proceeds to box 420
in Figure 3A, i.e. it is determined whether another
transaction request is made.

The flow chart of Figures 3A-3B does not
necessarily indicate a strictly linear sequence with respect to
different transactions (though for a given transaction the
flow is linear); e.g. in preferred embodiments a
transaction request can be allowed to issue from one of the
master interfaces even as another transaction request
is issued by the SC to a slave interface. Other types and
degrees of parallel operation may be implemented.

Flow Control

Figures 4-7 illustrate how flow control takes place
in the present invention for each of four different types
of transactions:

Figure 4: read operation from memory (i.e. where
the slave interface is a memory interface);

Figure 5: write operation to memory;

Figure 6: read operation from a device other than
memory; and

Figure 7: write operation to a device other than
memory.

Other operations, such as cached read transactions
(which involve the snoopbus, not a focus of the present
invention), are possible, but these will suffice to illustrate
the features of the present invention.

In Figures 4 and 5, for the sake of simplicity the
queues and registers illustrated in Figure 2 are not
shown, but should be understood to be included in both
the master interfaces (UPA ports) and system controller,
in essentially the same configuration as in Figure 2.
Thus, the transaction control described above with
respect to Figures 2 and 3 is accomplished also with
respect to Figures 4-5, as well as 6-7.

However, the memory banks shown in Figures 4
and 5 need not include slave queues as shown in Figure
2, nor does the system controller in Figure 4 need to
include a config register and counters as in Figure 2;
rather, conventional flow control as between a read- or
write-transaction requesting device and memory may
be utilized, and will be implementation-specific. Many
standard designs that ensure that read and write
requests are properly meted out to the memory banks will
be appropriate. In this example, steps (boxes) 470-490
and 530-550 in Figures 3A-3B are replaced by
equivalent steps for control of read and write transactions to
and from the memories.

In Figure 4, a specific embodiment of an
interconnect module 600 is illustrated, where memory banks
610 ... 620 are the slave devices, with a total of m
memory banks being indicated by the subscripts (0) ... (m-1).
There are likewise multiple master interfaces (UPA
ports) 630 ... 640, in the present example 32 master
interfaces being indicated by the subscripts 0 ... 31. A
datapath crossbar 625 couples the memory banks to the
UPA ports in a conventional manner.

As a rule, in this operation the order of reception of 
the transaction requests will be the order of reply by the 
slave interfaces. 

In general in Figures 4-7, the order of events is
indicated by the circled event numerals 1, 2, 3 and 4 (with
accompanying arrows indicating the direction of data or
signal flow), as the case may be for each figure. Except
that the memories in Figures 4 and 5 do not include the
slave queues and ID registers of the slaves shown in
Figure 2, the following description of data flow with
respect to Figures 4-7 should be understood to include
the steps described with respect to transaction request
control (see Figures 3A-3B). Thus, for each request
issued, the appropriate counter consultation,
incrementing and decrementing is carried out to ensure that the
request is sent at an appropriate time. The respective
queues are also handled as appropriate.

Memory Read Requests: Figure 4 

This read request example assumes that data is 
coming from memory, and not, e.g., from a cache. 
Snoop operations on the snoopbus shown in Figure 4 
are not in consideration here. 

Event 1: When a UPA master port such as port 630
has a read-from-memory transaction ready, and the
master counter is not at its allowed maximum (see box
430 in Figures 3A-3B), the read transaction is issued on
the UPA_Addressbus from UPA port 630 to the system
controller 650. This is indicated by event 1 (P_REQ)
along the UPA_Addressbus 660 in Figure 4, with the
direction of the information indicated by the arrow, i.e.
from the port to the SC.

Event 2: The memory cycle [i.e. RAS (row-address-strobe)/CAS
(column-address-strobe) request issuance] is issued over
memory control bus 670 to the memory banks 610 ... 620.
See event 2 ("RAS/CAS") along bus 670.

Event 3: The datapath is scheduled by a signal 
along the datapath control bus 680, and data items are 
accordingly delivered from memory to the datapath 
crossbar 625 via a memory databus 690 and UPA dat- 
abus 700. This fulfills the read request. 

Memory Write Requests: Figure 5 

Figure 5 depicts the same circuit as Figure 4, but 
the flow is different because it relates to a (non-cached) 
write operation instead of a read operation. Event 1 is 
the issuance of the write request along the UPA address 
bus 660. 
In event 2, the datapath control signal over the bus
680 is sent to enable the datapath crossbar 625. Also,
an S_REPLY is sent over bus 710 by the SC 650 to the
UPA port 630 to source the data after the datapath is
scheduled, and the data items are sent from the UPA
port 630 to the datapath crossbar over data bus 700.
Here, they are buffered in preparation for forwarding to
the memory banks. At this point, the counter in the UPA
port is decremented to show that another transaction
request is available to the system controller.

In event 3, the memory banks are enabled via bus 
670 using RAS/CAS signals, and data items are sent 
via bus 690 to the memory banks. This completes the 
write operation. 

The foregoing method ensures that no write request 
is issued until the write data are ready. E.g., if the data- 
bus 695 is 144 bits wide, but the bus 690 is 288 bits 
wide, the data words are buffered in the crossbar, as- 
sembled into 288-bit blocks, and then written to memory. 
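The width conversion in the crossbar is simple arithmetic: 288 / 144 = 2, so every two bus words form one memory block. The sketch below packs words with Python integers; `assemble_blocks` is a hypothetical helper, and placing the first word in the low-order bits is an assumption, since the text does not specify bit ordering.

```python
def assemble_blocks(words: list, in_width: int = 144, out_width: int = 288) -> list:
    """Pack narrow bus words into wide memory blocks. The first word of each
    block goes in the low-order bits (an assumed convention)."""
    ratio, rem = divmod(out_width, in_width)
    if rem:
        raise ValueError("output width must be a multiple of the input width")
    if len(words) % ratio:
        raise ValueError("incomplete block: keep buffering until more words arrive")
    mask = (1 << in_width) - 1
    blocks = []
    for i in range(0, len(words), ratio):
        block = 0
        for j, word in enumerate(words[i:i + ratio]):
            block |= (word & mask) << (j * in_width)
        blocks.append(block)
    return blocks


blocks = assemble_blocks([0x1, 0x2, 0x3, 0x4])
print(len(blocks))                                     # 2
print(blocks[0] >> 144, blocks[0] & ((1 << 144) - 1))  # 2 1
```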

Slave Read Requests: Figure 6

Figure 6 illustrates a read sequence to a slave
device other than memory, and is similar to Figure 2, but
for this example a single master interface 700 and a
single slave interface 710 are used, coupled by a system
controller 720 and a datapath crossbar 730.

Event 1 indicates the issuance of a read request 
P_REQ on UPA address bus 740 to SC 720. 

In event 2, the SC 720 sends the P_REQ on bus 
750 to the slave interface 710. To do this, if there are 
several slave interfaces, the SC must first decode the 
address to ensure that the P_REQ goes to the correct 
slave interface. Event 2 informs the slave interface to 
prepare the data to move through the datapath. 

When the data items are ready, then event 3 takes
place, namely the sending of a P_REPLY from the slave
710 to the SC 720 over bus 760.

In event 4, a series of steps are executed to cause 
the master interface to receive the data: SC 720 sched- 
ules the datapath 730, and issues an S_REPLY over 
bus 770 to the master interface 700. In addition, the SC 
issues the S_REPLY over bus 780 to the slave 710, to 
drive the data, when it is ready, on the slave's UPA da- 
tabus 790 via the datapath and over the databus 800 to 
the master interface 700. 

Slave Write Requests: Figure 7

Figure 7 shows the same apparatus as Figure 6,
but illustrates a write sequence from a master interface
to a non-memory slave interface. This sequence
ensures that data cannot be transferred until the data
queue PREQ_DQ of the slave interface 710 has
sufficient space.

In Figures 6 and 7, both a transaction request
counter 810 and a data queue counter 820 are shown in the
SC 720. These counters track how full the
PREQ queue and the PREQ_DQ queue (slave input data
queue) are, respectively. If these two queues are of
different sizes, then their associated counters 810 and 820
are of different sizes. If these two queues are the same
size, then a single counter may be used in the SC to
monitor how full both queues are.

Event 1: The first event of the write operation is that
a P_REQ is issued by the master interface 700 to the
system controller 720 over bus 740.

Event 2: In event 2, the SC issues the P_REQ over
bus 750 to the slave interface 710. The P_REQ includes
sufficient words to inform the SC how much data is being
written. As mentioned above, the slave data queue
counter 820 is used to track how full the data queue
PREQ_DQ is. If the PREQ_DQ queue is too full, then
the write transaction must wait.
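The admission test of event 2, together with the release on P_REPLY described for event 3, can be sketched with two counters. In this hedged Python model, `SlaveWriteCredits` and its methods are hypothetical names, and data sizes are counted in arbitrary "data units" (for example, one unit per word) because the queue organization is left to the implementation.

```python
class SlaveWriteCredits:
    """Hypothetical SC-side state for one non-memory slave:
    counter 810 tracks PREQ, counter 820 tracks PREQ_DQ."""

    def __init__(self, preq_depth: int, dq_capacity: int) -> None:
        self.preq_depth = preq_depth    # request slots in PREQ
        self.dq_capacity = dq_capacity  # data units PREQ_DQ can hold
        self.preq_count = 0             # counter 810
        self.dq_count = 0               # counter 820

    def can_admit(self, size: int) -> bool:
        # Event 2: forward the write only if both queues have room.
        return (self.preq_count < self.preq_depth
                and self.dq_count + size <= self.dq_capacity)

    def admit(self, size: int) -> None:
        if not self.can_admit(size):
            raise RuntimeError("write must wait: PREQ or PREQ_DQ too full")
        self.preq_count += 1
        self.dq_count += size

    def p_reply(self, size: int) -> None:
        # Event 3: the slave has drained one request and its data.
        self.preq_count -= 1
        self.dq_count -= size


sc = SlaveWriteCredits(preq_depth=4, dq_capacity=8)
sc.admit(4)              # a block-sized write (4 units, illustrative)
sc.admit(1)              # a word-sized write (1 unit, illustrative)
print(sc.can_admit(4))   # False: 5 + 4 exceeds 8 units
sc.p_reply(4)
print(sc.can_admit(4))   # True
```

When the two queues have the same size, the two counters move in lockstep, which is why a single counter can suffice in that case.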

The data queue PREQ_DQ may be the width of one
word (e.g. 16 bits) or a block (e.g. 64 bits). Multiple
transfer sizes are thus supported in the current system.
Possible queue organizations include the maximum
capacity per request, or some fraction of the maximum
capacity per request, e.g. the 64-bit and 16-bit examples cited
above.

If the queue PREQ_DQ has sufficient space available,
then the write operation may proceed. Further in event
2, the SC schedules the datapath 730 with a datapath
control signal "DP ctrl", and issues an S_REPLY to the
master interface over bus 770 to drive the data on its
data bus 800. In addition, the SC issues the S_REPLY
over bus 780 to tell the slave interface 710 to receive
the data over its data bus 790.

The transaction is complete as far as the master
interface is concerned once it has received the S_REPLY
and the data has been transferred over the bus 800 to
the datapath crossbar 730. Thus, at this point, even
though the slave interface may not yet have received
the data, the master interface is prepared for an
additional transaction.

Since the address and data paths are independent,
the request packet (which includes the destination
address) and the corresponding data may be forwarded in
any order to the slave port. That is, the data might
actually arrive at the input queue PREQ_DQ before the
P_REQ arrives at the queue PREQ of the slave. If this
happens, the data will have to wait until the P_REQ
arrives, so that the slave can determine the destination
address for the data. Alternatively, of course, the
P_REQ may arrive first and the data second, in which
case the data can immediately be written to the
destination address specified by the P_REQ.
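The slave-side pairing of requests and data can be sketched as two queues that are matched whenever both are non-empty. This is a minimal model under an explicit assumption: as in the foregoing description, data items follow the order of their write requests, so the k-th data item belongs to the k-th P_REQ. `SlaveWritePort` and its methods are hypothetical names, and a dictionary stands in for the slave's address space.

```python
from collections import deque


class SlaveWritePort:
    """Hypothetical slave port whose request and data paths are independent."""

    def __init__(self) -> None:
        self.preq = deque()     # queue PREQ: destination addresses
        self.preq_dq = deque()  # queue PREQ_DQ: write data
        self.memory = {}        # destination address -> written data

    def on_request(self, address: int) -> None:
        self.preq.append(address)
        self._match()

    def on_data(self, data: bytes) -> None:
        self.preq_dq.append(data)
        self._match()

    def _match(self) -> None:
        # Whichever half arrived first waits until its partner is present.
        while self.preq and self.preq_dq:
            self.memory[self.preq.popleft()] = self.preq_dq.popleft()


port = SlaveWritePort()
port.on_data(b"\x01\x02")  # data arrives before its P_REQ and waits
print(port.memory)         # {}
port.on_request(0x40)      # the P_REQ supplies the address; the write completes
print(port.memory)         # {64: b'\x01\x02'}
```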

Event 3: Once the slave has cleared the requested
data from its data queue and the transaction request
from its input queue, it issues a P_REPLY over bus 760
to the SC, indicating that it is ready for another
transaction. The SC decrements its counters 810 and 820
accordingly. The transaction is now complete from the
SC's point of view, i.e. there are no more actions to be
taken by the SC.



Transaction Ordering 

The transactions herein are any type of request by 
a master device or module (hardware, software, or a 
combination). These include read-data transfers, write- 
data transfers, etc., which must be connected with the 
read and write replies. That is, for example, each write 
request is logically linked to write data (i.e. data to be 
written). While in the foregoing description the ordering 
of data transfer has been assumed to be governed by 
the order of write requests, other possibilities exist. 

For instance, a link between a write request and the 
write data may be accomplished by assigning tokens to 
each write request and its corresponding data. The to- 
kens would then be used to inform the system controller 
and processor of the completion of a given write request; 
that is, the write data carries its token along with it, and 
when it is received the write request having the associ- 
ated token is known to be complete. Such a system re- 
quires token match logic to locate the associated to- 
kens. The token system can be applied to the system 
controller operation described above for any
transactions requested by a master, and frees up the ordering
requirement of transaction requests vis-a-vis completion
of the requests; that is, read and write transactions may
be carried out in an order other than the order in which
they were issued.
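The token scheme can be sketched as a table keyed by token. In this hedged example, `TokenMatcher`, `issue_write`, and `on_data_received` are hypothetical names, and tokens are simply consecutive integers; a hardware implementation would use a fixed-width tag and associative match logic instead of a Python dictionary.

```python
import itertools


class TokenMatcher:
    """Hypothetical token-match logic: a write request and its data carry the
    same token, so completions can be recognized in any order."""

    def __init__(self) -> None:
        self._tokens = itertools.count()
        self.pending = {}  # token -> description of the outstanding write

    def issue_write(self, description: str) -> int:
        token = next(self._tokens)
        self.pending[token] = description
        return token

    def on_data_received(self, token: int) -> str:
        # The arriving data names its request; that request is now complete.
        return self.pending.pop(token)


tm = TokenMatcher()
t0 = tm.issue_write("write A")
t1 = tm.issue_write("write B")
print(tm.on_data_received(t1))  # write B (completes before A)
print(sorted(tm.pending))       # [0]
```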

In any case, for each transaction there will be a
corresponding reply by the system controller, whether or
not there is a data transfer. As a general matter, the
order of events for various transactions will be:

Read from slave: read request -> slave read reply
-> read data transfer (optional)
Write from master: write request -> SC write reply
-> write data transfer (optional)
Write from slave: (write request/write data transfer,
in either order) -> slave reply when write data
is consumed

Thus, the present system is adaptable to many dif- 
ferent ordering schemes, or indeed to schemes such as 
a token system where no particular ordering is required. 

Claims 

1. A control system for controlling transaction flow in 
a computer system having at least one microproc- 
essor with at least one master device, a main mem- 
ory, and at least one slave device having a slave 
receive queue for storing transaction requests, the 
control system including: 

a first register storing a first register value rep- 
resenting the number of locations available for 
queueing of transaction requests by said at 
least one slave device; 
a first counter storing a first counter value
representing at any given time the number of
transaction requests that are stored in said slave
receive queue;

an interconnect receive queue coupled to said 
at least one master device; 
a first circuit coupled to said first register, coun- 
ter and slave receive queue, including compar- 
ison logic to compare said first register value 
with said counter value, and adapted for trans- 
ferring said transaction requests from said in- 
terconnect receive queue to said slave device 
when said counter value is less than said first 
register value. 

2. The control system of claim 1, wherein said at least
one master device includes:

a second register storing a second register val- 
ue representing the number of locations avail- 
able for queueing of transaction requests in 
said interconnect receive queue; and 
a second counter storing a second counter val- 
ue representing at any given time the number 
of transaction requests that are stored in said 
interconnect receive queue; 
a second circuit coupled to said second regis- 
ter, second counter and interconnect receive 
queue, including logic adapted to compare said 
second register value with said second counter 
value, and adapted for transferring said trans- 
action requests from said master device to said 
interconnect device when said second counter 
value is less than said second register value. 

3. The control system of claim 2, wherein said logic is
further adapted to determine the size of said
interconnect receive queue and to store a value
representing said size in said second register.

4. The control system of claim 2, further including: 

a datapath circuit coupled between said master 
device and said slave device; 
wherein said logic is coupled to said datapath
circuit and is adapted to enable said datapath
circuit for data transfers between said master
device and said slave device in response to read
and write requests from said master.

5. The control system of claim 1, including logic
adapted to determine the size of each said slave receive
queue and to store a value representing each said
size in a said register corresponding to each said
slave device.

6. A method for controlling transaction flow in a
computer system including at least one master device,
at least one slave device, and a system
interconnect connected between the master and slave
devices, the interconnect having at least one
interconnect request queue coupled to each said master
device and each said slave device including at least
one slave request queue coupled to said system
interconnect, the method including the steps of:

(1) transmitting a first transaction request from
a first said master device to a first said system
interconnect;

(2) receiving the first transaction request in a 
first said interconnect receive queue of said first 
system interconnect; 

(3) determining a recipient slave device for said 
first transaction request; 

(4) determining whether the recipient slave request
queue of the recipient slave device is
available to accept said first transaction request,
and if so, forwarding said first transaction
request to said recipient slave request
queue;

(5) receiving said first transaction request at 
said recipient slave request queue; 

(6) executing said first transaction request by 
said recipient slave device; 

(7) notifying the system interconnect, by the 
slave device, that said first transaction request 
has been executed. 

7. The method of claim 6, further including, prior to
step 1, the additional step of determining by said
first master device whether said first interconnect
receive queue can accept said first transaction
request.

8. The method of claim 6, further including the steps of:

storing a pending request value representing
the total number of transactions that said recipient
slave request queue can receive; and
maintaining a count of the number of transaction
requests present in the recipient slave request
queue;

wherein step 4 includes the step of comparing
said count with said pending request value, to
determine that the recipient slave request
queue can accept the transaction request if
said count is less than said pending request value.

9. The method of claim 8, further including, after step
7, the step of the system interconnect decrementing
said count.
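Steps 1 through 7 of claim 6, with the count and pending request value of claims 8 and 9, can be traced with a small simulation. This is a minimal sketch, not the claimed apparatus: it assumes a single master, an address decoder supplied as a plain function (hypothetical), and request records shaped as dictionaries with `id` and `addr` keys, all invented for illustration. It also forwards strictly in arrival order, which is an assumption of the sketch rather than something claim 6 specifies.

```python
from collections import deque


class Slave:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity          # size of the slave request queue
        self.request_queue = deque()


class SystemInterconnect:
    def __init__(self, slaves, decode) -> None:
        self.receive_queue = deque()      # interconnect queue for one master
        self.slaves = slaves              # name -> Slave
        self.decode = decode              # hypothetical address decoder: addr -> name
        # Claim 8: one pending request value and one count per slave.
        self.pending_request_value = {n: s.capacity for n, s in slaves.items()}
        self.count = {n: 0 for n in slaves}

    def receive(self, request) -> None:
        # Steps 1 and 2: the master's request lands in the receive queue.
        self.receive_queue.append(request)

    def forward_oldest(self) -> bool:
        if not self.receive_queue:
            return False
        # Step 3: determine the recipient slave device.
        name = self.decode(self.receive_queue[0]["addr"])
        # Step 4 as refined by claim 8: available iff count < pending value.
        if self.count[name] >= self.pending_request_value[name]:
            return False                  # hold the request in the receive queue
        self.count[name] += 1
        # Step 5: the slave request queue receives the request.
        self.slaves[name].request_queue.append(self.receive_queue.popleft())
        return True

    def notify_executed(self, name) -> None:
        # Step 7 notification; claim 9 then decrements the count.
        self.count[name] -= 1


def execute_oldest(ic: SystemInterconnect, name: str) -> int:
    # Step 6: the slave executes its oldest request, then notifies (step 7).
    request = ic.slaves[name].request_queue.popleft()
    ic.notify_executed(name)
    return request["id"]


slaves = {"mem": Slave(1), "io": Slave(2)}
ic = SystemInterconnect(slaves, lambda addr: "mem" if addr < 0x1000 else "io")
ic.receive({"id": 1, "addr": 0x10})
ic.receive({"id": 2, "addr": 0x20})
print(ic.forward_oldest(), ic.forward_oldest())  # True False
print(execute_oldest(ic, "mem"), ic.forward_oldest())  # 1 True
```

The interconnect never inspects the slave queue directly; it relies only on its own count, which is why the completion notice of step 7 is what re-opens the slave queue.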

10. The method of claim 6, including, before step 1, the
steps of:

determining the queue size of said slave request
queue for each slave in the system; and
storing a value representing each said queue
size in at least one register in said system
interconnect.

11. The method of claim 6, including, before step 1, the
steps of:

determining the queue size of each said interconnect
request queue; and
storing, in a register of each said master device,
a value representing the queue size of the
interconnect request queue coupled to that master
device.
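Claims 10 and 11 require the queue sizes to be determined and stored before step 1 but leave open how they are determined. The sketch below assumes, purely for illustration, that each slave reports a fixed request queue size and that the size of each master's interconnect request queue is supplied as a table; the dataclass and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SlaveDevice:
    name: str
    request_queue_size: int   # assumed to be fixed and known at reset


@dataclass
class MasterDevice:
    name: str
    # Register holding the size of this master's interconnect request queue.
    interconnect_queue_size_register: int = 0


@dataclass
class InterconnectRegisters:
    # One register per slave holding that slave's request queue size.
    slave_queue_size: dict = field(default_factory=dict)


def initialize(slaves, masters, interconnect_queue_sizes):
    """Performed before step 1 (claims 10 and 11)."""
    regs = InterconnectRegisters()
    for slave in slaves:                      # claim 10
        regs.slave_queue_size[slave.name] = slave.request_queue_size
    for master in masters:                    # claim 11
        master.interconnect_queue_size_register = interconnect_queue_sizes[master.name]
    return regs


cpu = MasterDevice("cpu0")
regs = initialize([SlaveDevice("mem", 4), SlaveDevice("io", 2)], [cpu], {"cpu0": 8})
print(regs.slave_queue_size, cpu.interconnect_queue_size_register)
# {'mem': 4, 'io': 2} 8
```

Once these registers are loaded, every sender knows the capacity of the queue immediately downstream of it, which is the precondition for the comparisons in claims 8 and 13.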

12. The method of claim 6, including, before step 1, the
step of:

(8) determining whether said first interconnect
request queue can accept the transaction request,
and if so, forwarding the transaction request to said
first system interconnect.

13. The method of claim 12, further including, before
step 7, the steps of:

storing a pending request value representing
the total number of transactions that said first
interconnect request queue coupled to said first
master can receive; and
maintaining a count of the number of transaction
requests present in said first interconnect
request queue;

wherein step 4 includes the step of comparing
said count with said pending request value, to
determine that said first interconnect request
queue can accept said first transaction request
if said count is less than said pending request
value.

14. A control system for controlling flow of transactions
in a computer system including a master device and
a slave device, comprising:

a slave receive queue in said slave device for
receiving a predetermined maximum number of
transaction requests;

a system interconnect including a receive
queue for receiving transaction requests from
said master device and including a counter for
maintaining a count of outstanding requests in
said slave receive queue; and
logic to determine whether said count is less
than said predetermined maximum number.

15. A control system for controlling flow of transactions
in a computer system including at least a first master
device, a first slave device and a second slave
device, including:

an interconnect receive queue configured for
storing a plurality of transaction requests issued
by said first master device;
a first output queue coupled to said first slave
device and configured for storing a first predetermined
number of said transaction requests
directed to said first slave device;
a first slave receive queue configured for receiving
a second predetermined number of said
transaction requests;

a second output queue coupled to said second
slave device and configured for storing a third
predetermined number of said transaction requests
directed to said second slave device;
a second slave receive queue configured for receiving
a fourth predetermined number of said
transaction requests;

logic adapted to determine which of said first
and second slave devices is the intended recipient
slave device for each transaction request
stored in said interconnect receive queue and
to transmit each said transaction request to the
output queue coupled to said determined recipient
slave device for that transaction.
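The routing logic of claim 15 moves each request from the interconnect receive queue into the bounded output queue of its recipient slave. A minimal sketch, assuming a hypothetical routing function that maps a request to a slave name, and assuming that dispatch stops at the first request whose output queue is full so that arrival order is preserved (the claim itself does not state this policy):

```python
from collections import deque


class OutputQueue:
    """Bounded per-slave output queue in the interconnect."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity   # e.g. the first or third predetermined number
        self.items = deque()

    def full(self) -> bool:
        return len(self.items) >= self.capacity

    def push(self, request) -> None:
        if self.full():
            raise OverflowError("output queue is full")
        self.items.append(request)


def dispatch(receive_queue, output_queues, recipient_of):
    """Move requests, oldest first, from the interconnect receive queue to
    the output queue of their recipient slave. Stops at the first request
    whose output queue is full, preserving arrival order (an assumption)."""
    moved = 0
    while receive_queue:
        request = receive_queue[0]
        target = output_queues[recipient_of(request)]
        if target.full():
            break
        target.push(receive_queue.popleft())
        moved += 1
    return moved


receive_queue = deque(["a1", "b1", "a2", "a3"])
outputs = {"A": OutputQueue(2), "B": OutputQueue(1)}
# Hypothetical routing: the first character names the slave.
print(dispatch(receive_queue, outputs, lambda r: r[0].upper()))  # 3
print(list(receive_queue))  # ['a3']
```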

16. The control system of claim 15, wherein said logic
is further adapted to inhibit transmission of a transaction
request from said first master device to said
second slave device until all transaction requests from said
first master device to said first slave device have
been processed.
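Claim 16 does not say how the logic learns that a request has been processed. The sketch below assumes a per-request completion notice, as in step 7 of claim 6, and keeps one guard per master; the class and method names are hypothetical.

```python
class OrderingGuard:
    """Per-master check for claim 16: a request to a different slave is
    held until every request already sent to the current slave is done."""

    def __init__(self) -> None:
        self.current_slave = None   # slave with requests still in flight
        self.outstanding = 0        # how many requests are in flight

    def may_send(self, slave: str) -> bool:
        return self.outstanding == 0 or slave == self.current_slave

    def sent(self, slave: str) -> None:
        if not self.may_send(slave):
            raise RuntimeError("would overtake requests to another slave")
        self.current_slave = slave
        self.outstanding += 1

    def processed(self) -> None:
        # Called when the current slave reports one request processed.
        if self.outstanding == 0:
            raise RuntimeError("no request in flight")
        self.outstanding -= 1
        if self.outstanding == 0:
            self.current_slave = None


guard = OrderingGuard()
guard.sent("first")
guard.sent("first")
print(guard.may_send("second"))  # False
guard.processed()
guard.processed()
print(guard.may_send("second"))  # True
```

Further requests to the same slave may still be sent while earlier ones are in flight; only a switch to another slave waits for the in-flight requests to drain.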

17. The control system of claim 15, further including:

a first counter storing a first count of the total
number of transaction requests present in the
first slave receive queue;
wherein the logic is further adapted to inhibit
transmission of a transaction request from said
first output queue to said first slave receive
queue when said count is at least equal to said
second predetermined number.

18. The control system of claim 17, further including:

a second counter storing a second count of the
total number of transaction requests present in
the second slave receive queue;
wherein the logic is further adapted to inhibit
transmission of a transaction request from said
second output queue to said second slave receive
queue when said second count is at least equal
to said fourth predetermined number.
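Claims 17 and 18 give each slave its own counter and limit, so each output queue drains into its slave receive queue independently. A minimal sketch under the assumption that the slave reports each request it removes from its receive queue; `SlaveLink` and its methods are hypothetical names:

```python
from collections import deque


class SlaveLink:
    """Output queue, counter and limit for one slave (claims 17 and 18)."""

    def __init__(self, slave_receive_queue_size: int) -> None:
        self.output_queue = deque()
        # Second (or fourth) predetermined number: slave receive queue size.
        self.limit = slave_receive_queue_size
        # First (or second) counter: requests now in the slave receive queue.
        self.count = 0

    def drain(self) -> list:
        """Send from the output queue while count < limit; return what was sent."""
        sent = []
        while self.output_queue and self.count < self.limit:
            sent.append(self.output_queue.popleft())
            self.count += 1
        return sent

    def slave_done(self) -> None:
        # The slave has removed one request from its receive queue.
        if self.count == 0:
            raise RuntimeError("no request outstanding at this slave")
        self.count -= 1


first, second = SlaveLink(2), SlaveLink(1)
first.output_queue.extend(["r1", "r2", "r3"])
second.output_queue.append("s1")
print(first.drain(), second.drain())  # ['r1', 'r2'] ['s1']
first.slave_done()
print(first.drain())  # ['r3']
```

Because the counters are separate, a full first slave receive queue inhibits only the first output queue; the second output queue keeps draining.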